19 July 2016

Why Study the NYC Taxi Data?

  • The combination of ~24 GB (140M YELLOW taxi rides) and ~3 GB (19M GREEN taxi rides) of 2015 data prepares us to deal with big data sets.
  • The longitude and latitude recorded by each taxi's GPS at the start and the destination of each ride capture the geospatial distribution of the taxi rides.

  • The time stamps of the taxi rides capture New Yorkers' highly dynamic commuting activity across years, months, weeks, and hours.

  • The tips and taxi fares paid for the rides capture the passengers' behavior in response to NYC's traffic conditions.

To Tip or Not to Tip, That's the Question!

The Tipping Habit of the NYC Taxi Passengers

The goal of this talk is to explore the tipping behavior of New Yorkers through both the yellow and the green taxi ride data from 2015.

Tips serve as a proxy for the passengers' satisfaction with their rides.

Our main characters: YELLOW cabs and their little cousins, GREEN cabs.

Why Is There a Need for the GREEN Cabs (Boro Taxis)?

  • The GREEN taxi origination map is on the left.

  • GREEN taxis are not allowed to pick up passengers within the so-called 'yellow zone', i.e. Manhattan below East 96th Street and West 110th Street.

  • Both maps are obtained by sampling the original data and plotting the sample with ggplot.
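
The talk does not say how the sample was drawn. One way to thin a file that is too large to load at once is reservoir sampling, sketched below in Python (the maps themselves were made with ggplot). The function name `reservoir_sample`, the sample size and the seed are illustrative assumptions, not the method used for the slides.

```python
import random

def reservoir_sample(rows, k, seed=0):
    """Return up to k rows drawn uniformly at random from an iterable.

    Assumed technique (reservoir sampling); the talk only says the
    data were sampled. Works in one pass without knowing the length.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                sample[j] = row
    return sample

# Stand-in for an iterator over ride records (e.g. csv.DictReader rows).
subset = reservoir_sample(range(1000), 10)
```

Because the rows are consumed one at a time, the same function can be fed a CSV reader over the full ride file, and only the k sampled pickup or drop-off coordinates need to be kept for plotting.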

The Destination Plots

  • The green taxis travel to a wider range of destinations than their origins, including northern Manhattan and LaGuardia Airport, whereas the yellow taxis tend to stay within the lower Manhattan 'yellow zone'.

Taxi Rides: Weekdays vs Hours 2D Heat Map

  • GREEN taxi rides are on the left.
  • The hour marks increase monotonically along the y-axis.
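
The counts behind such a heat map can be sketched as follows. This is a minimal illustration, assuming pickup time stamps are strings of the form `2015-01-05 08:15:00`; the function names and the grid layout (hours as rows, weekdays as columns) are our own choices for the example.

```python
from collections import Counter
from datetime import datetime

def rides_by_weekday_hour(pickup_times, fmt="%Y-%m-%d %H:%M:%S"):
    """Count rides per (weekday, hour) cell; Monday is weekday 0."""
    counts = Counter()
    for ts in pickup_times:
        t = datetime.strptime(ts, fmt)  # assumed time stamp format
        counts[(t.weekday(), t.hour)] += 1
    return counts

def as_grid(counts):
    """Arrange counts as 24 rows (hours) by 7 columns (weekdays)."""
    return [[counts.get((day, hour), 0) for day in range(7)]
            for hour in range(24)]

counts = rides_by_weekday_hour([
    "2015-01-05 08:15:00",  # a Monday
    "2015-01-05 08:45:00",
    "2015-01-01 23:59:59",  # a Thursday
])
grid = as_grid(counts)
```

Replacing the count with a per-cell average of speed or tip percentage gives the inputs for the other weekday-vs-hour heat maps.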

Taxi Speed: Weekdays vs Hours 2D Heat Map

  • GREEN taxi rides are on the left.
  • The hour marks increase monotonically along the y-axis.

Taxi Tip %: Weekdays vs Hours 2D Heat Map

  • The 2D heat maps give us important insights into the passengers' tipping behavior, but we need more powerful analysis to understand it.

Taxi Tip Percentages and Speeds (Green=Left)

Both graphs display an interesting inverted 'V' shaped pattern, indicating that the tip percentages peak near 10 mph. This is the 'consensus' speed most passengers are O.K. with.
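
A curve of this kind can be computed roughly as in the sketch below. It rests on assumptions the slides do not spell out: speed is taken as trip distance divided by trip duration, the tip percentage as tip divided by fare, and rides are grouped into 2.5 mph bins. The function names and the tuple layout of `rides` are hypothetical.

```python
from collections import defaultdict

def speed_mph(distance_miles, duration_minutes):
    """Average trip speed; None when the duration is not positive."""
    if duration_minutes <= 0:
        return None
    return distance_miles * 60.0 / duration_minutes

def tip_percent(tip, fare):
    """Tip as a percentage of the fare (assumed definition)."""
    if fare <= 0:
        return None
    return 100.0 * tip / fare

def mean_tip_by_speed(rides, bin_width=2.5):
    """Map the lower edge of each speed bin to the mean tip percentage.

    rides: iterable of (distance_miles, duration_minutes, fare, tip).
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [sum of %, count]
    for distance, minutes, fare, tip in rides:
        v = speed_mph(distance, minutes)
        p = tip_percent(tip, fare)
        if v is None or p is None:
            continue  # skip records with unusable duration or fare
        lower_edge = int(v // bin_width) * bin_width
        totals[lower_edge][0] += p
        totals[lower_edge][1] += 1
    return {edge: s / n for edge, (s, n) in sorted(totals.items())}

curve = mean_tip_by_speed([
    (2.0, 12.0, 10.0, 2.0),  # 10 mph, 20% tip
    (1.0, 6.0, 10.0, 1.0),   # 10 mph, 10% tip
    (5.0, 10.0, 20.0, 0.0),  # 30 mph, no tip
    (1.0, 0.0, 5.0, 1.0),    # zero duration, skipped
])
```

Plotting the returned values against the bin edges, separately for green and yellow rides, gives a curve comparable to the ones on this slide.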

Taxi Rides Grouped by the Speeds

The green taxi rides peak above 10 mph, but the yellow taxi rides peak below 10 mph. The yellow taxi plot has a 'fatter' distribution, indicating that the variance of the yellow taxi speeds is larger.

Ratio of Taxi Passengers Who Leave No Tip

  • Both graphs display a 'U' shape: the no-tip ratio decreases, only to rebound above the 'consensus' speeds. The optimal speed range is approximately [15, 20] mph for green cabs and approximately [7.5, 12.5] mph for yellow cabs.
  • For yellow cab passengers, the rebound is weak and the ratio stabilizes above 30 mph.
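
The no-tip ratio per speed bin can be sketched in the same spirit. The 2.5 mph bin width, the function name and the record layout are assumptions made for this example. As far as we know, the public TLC records only include tips paid by card, so a zero in the tip field may also hide an unrecorded cash tip.

```python
from collections import defaultdict

def no_tip_ratio_by_speed(rides, bin_width=2.5):
    """Map the lower edge of each speed bin to the share of rides with no tip.

    rides: iterable of (distance_miles, duration_minutes, tip).
    A recorded tip of zero counts as 'no tip' (cash tips may be missing).
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [no-tip rides, all rides]
    for distance_miles, duration_minutes, tip in rides:
        if duration_minutes <= 0:
            continue  # speed is undefined for these records
        speed = distance_miles * 60.0 / duration_minutes
        lower_edge = int(speed // bin_width) * bin_width
        if tip <= 0:
            counts[lower_edge][0] += 1
        counts[lower_edge][1] += 1
    return {edge: zero / total
            for edge, (zero, total) in sorted(counts.items())}

ratios = no_tip_ratio_by_speed([
    (2.0, 12.0, 0.0),  # 10 mph, no tip
    (1.0, 6.0, 1.5),   # 10 mph, tipped
    (5.0, 10.0, 0.0),  # 30 mph, no tip
])
```

The bin with the smallest ratio is then a candidate for the 'optimal' speed range, and restricting the input to rides with a positive tip gives the population of passengers who do tip.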

Tip Percentages for Those Who Pay Tips

  • Green taxi riders tip slow taxi drivers with high percentages, but the percentage decreases as the speed increases.

  • Yellow taxi riders show a totally different tipping behavior. Their tip percentages peak at about 20 mph and stay within a narrow range.

We plot the above behavior for different time frames to compare how the tip percentages vary with speed.

  • Green taxi riders are relatively indifferent to speed during the morning rush hours but become very speed-sensitive in all the other time frames, with very steep inverted 'V' shaped responses.
  • Yellow taxi riders mostly do not display such a strong dependence of tipping on speed. The 4-7pm group even shows a monotonically increasing tipping behavior.

We plot the trip durations (minutes) for each time frame

And the distances (miles)

Conclusion

  • New York taxi data contain a wealth of geospatial and local traffic information.
  • Decoding this information is an interesting exercise in BIG DATA.
  • By studying the passengers' tipping behavior, we understand the commuting needs of New Yorkers better.
  • We identify speed, trip duration and time frame (early morning, morning rush hours ….) as factors that shape how satisfied or dissatisfied passengers feel with their drivers and how they reward them accordingly.